Probability Estimation with SVM
Platt (2000), "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods"
https://gyazo.com/2d12317f6e5d60ce9fe55a6ac142962a
The story is that many applications need the probability of a label rather than just the predicted label, so Platt proposed approximating that probability by fitting a sigmoid to the SVM output.
A and B are parameters, obtained by maximizing the log-likelihood.
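This is presumably the formula in the screenshot above; a minimal sketch in LaTeX, with symbols following Platt's paper (f(x) is the SVM decision value, t_i the 0/1 label):

```latex
% Platt's sigmoid fit to the SVM decision value f(x):
P(y = 1 \mid x) \approx \frac{1}{1 + \exp(A f(x) + B)}

% A, B maximize the log-likelihood of the held-out labels t_i \in \{0, 1\}:
\max_{A,B} \sum_i \left[ t_i \log p_i + (1 - t_i) \log(1 - p_i) \right],
\qquad p_i = \frac{1}{1 + \exp(A f(x_i) + B)}
```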
In other words, it is equivalent to "training a logistic regression that takes the SVM output value f(x) as its explanatory variable, instead of passing f(x) through sign." This is done with 5-fold cross-validation
(train the SVM on 4/5 of the data, compute f(x) on the remaining 1/5, and fit a logistic regression to the relationship between f(x) and the true labels), as in the sketch below.
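A minimal sketch of this procedure in scikit-learn (the toy dataset and variable names are illustrative assumptions, not from the original note):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Out-of-fold SVM decision values: each f(x_i) comes from an SVM
# trained on the other 4/5 of the data (5-fold CV).
svm = SVC(kernel="rbf", gamma="scale")
f = cross_val_predict(svm, X, y, cv=5, method="decision_function")

# Fit a 1-D logistic regression: f(x) -> P(y=1).
# Its coefficient and intercept play the roles of A and B
# (up to sign conventions), and its default L2 penalty
# provides the smoothing mentioned below.
platt = LogisticRegression()
platt.fit(f.reshape(-1, 1), y)

# Probability estimates: sigmoid applied to f(x) of a final SVM
# trained on all the data.
svm.fit(X, y)
proba = platt.predict_proba(svm.decision_function(X).reshape(-1, 1))[:, 1]
print(proba[:5])
```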
In cases where the SVM separates the two classes cleanly, the maximum-likelihood logistic regression naturally degenerates into a step function (an infinitely steep sigmoid fits the training data perfectly).
So smoothing (regularization) is applied.
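For reference, Platt's paper avoids this degeneracy not only by fitting on held-out data but also by smoothing the targets: instead of hard 0/1 labels, the likelihood uses regularized target probabilities (notation from the paper, with N_+ and N_- the counts of positive and negative training examples):

```latex
t_+ = \frac{N_+ + 1}{N_+ + 2}, \qquad t_- = \frac{1}{N_- + 2}
```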
Notes on using predict_proba in Scikit-Learn
Understand it as "an algorithm that does not return probability values gets a logistic regression layered on top of it so that it returns probability values."
Since the SVM is trained 5 times, each on 4/5 of the data, it naturally takes roughly 4-5 times longer to train.
Making the SVM return probability values does not improve the SVM's classification accuracy. That is not what the mechanism is for.
If the hyperparameters are chosen properly, the accuracy will be about the same either way.
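A minimal usage sketch of predict_proba (the toy dataset is an illustrative assumption; the caveat in the last comment is from the scikit-learn documentation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# probability=True triggers the internal cross-validated Platt scaling,
# which is why fit() becomes noticeably slower.
clf = SVC(kernel="rbf", gamma="scale", probability=True)
clf.fit(X_tr, y_tr)

print(clf.predict(X_te[:3]))        # hard labels from sign(f(x))
print(clf.predict_proba(X_te[:3]))  # calibrated per-class probabilities

# Note: because the probabilities come from a separate model fitted on
# cross-validated outputs, the argmax of predict_proba can occasionally
# disagree with predict.
```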
---
This page is auto-translated from /nishio/SVMで確率推定. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.